Performance of Machine Learning Algorithms with Different K Values in K-fold Cross-Validation

Abstract

The numerical value of k in the k-fold cross-validation training technique for machine learning predictive models is an essential element that affects the model's performance. A good choice of k results in better accuracy, while a poorly chosen value of k might degrade the model's performance. In the literature, the most commonly used values of k are five (5) or ten (10), as these two values are believed to give test error rate estimates that suffer neither from extremely high bias nor from very high variance. However, there is no formal rule. To the best of our knowledge, few experimental studies have attempted to investigate the effect of diverse k values in training different machine learning models. This paper empirically analyses the prevalence and effect of distinct k values (3, 5, 7, 10, 15 and 20) on the validation performance of four well-known machine learning algorithms: Gradient Boosting Machine (GBM), Logistic Regression (LR), Decision Tree (DT) and K-Nearest Neighbours (KNN). It was observed that the value of k and the model validation performance differ from one machine learning algorithm to another for the same classification task. However, our empirical results suggest that k = 7 offers a slight increase in validation accuracy and area under the curve, with lower computational complexity than k = 10, across most of the machine learning algorithms (MLA). We discuss the study outcomes in detail and outline some guidelines for beginners in the machine learning field on selecting the best k value and machine learning algorithm for a given task.
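The trade-off the abstract describes can be illustrated with a minimal Python sketch. It is not the paper's experimental code: the function names `k_fold_indices` and `summarize`, the sample count of 1000, and the unshuffled contiguous splitting are all assumptions made for illustration. For each k value studied, it reports how many model fits are needed and how much data each fit trains on.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds.

    Hypothetical helper: folds are contiguous and unshuffled for simplicity.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def summarize(n_samples: int, k_values=(3, 5, 7, 10, 15, 20)):
    """Return one row per k: (k, number of model fits, smallest training-set size)."""
    rows = []
    for k in k_values:
        folds = list(k_fold_indices(n_samples, k))
        min_train = min(len(train) for train, _ in folds)
        rows.append((k, len(folds), min_train))
    return rows


if __name__ == "__main__":
    # n_samples=1000 is an arbitrary illustrative value, not the paper's dataset size.
    for k, fits, min_train in summarize(n_samples=1000):
        print(f"k={k:2d}  fits={fits:2d}  min training size={min_train}")
```

Each additional fold adds one more model fit, so k = 7 needs seven fits where k = 10 needs ten, while each fit still trains on roughly 6/7 of the data. This shows only the cost side of the comparison; the accuracy and area-under-the-curve differences reported in the abstract depend on the dataset and the algorithm.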

Similar resources

n-fold Commutative Hyper K-ideals

In this paper, we introduce the definitions of n-fold commutative and implicative hyper K-ideals. These definitions generalize the definitions of commutative and implicative hyper K-ideals, respectively, which were defined in [12]. We then obtain some related results. In particular, we determine the relationships between n-fold implicative hyper K-ideal and n-fol...

The 'K' in K-fold Cross Validation

The K-fold Cross Validation (KCV) technique is one of the most used approaches by practitioners for model selection and error estimation of classifiers. The KCV consists in splitting a dataset into k subsets; then, iteratively, some of them are used to learn the model, while the others are exploited to assess its performance. However, in spite of the KCV success, only practical rule-of-thumb me...

Machine learning algorithms in air quality modeling

Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to these issues. It is a fact that input variable type largely affec...

Multi-K Machine Learning Ensembles

Ensemble machine learning models often surpass single models in classification accuracy at the expense of higher computational requirements during training and execution. In this paper we present a novel ensemble algorithm called Multi-K which uses unsupervised clustering as a form of dataset preprocessing to create component models that lead to effective and efficient ensembles. We also presen...

Comparative Analysis of Machine Learning Algorithms with Optimization Purposes

The fields of optimization and machine learning increasingly interact, and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and have an important role in extracting knowledge from large amounts of data. In this paper, a methodology has been employed to opt...

Journal

Journal title: International Journal of Information Technology and Computer Science

Year: 2021

ISSN: 2074-9007, 2074-9015

DOI: https://doi.org/10.5815/ijitcs.2021.06.05